-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
atmel_nand.c: Empty page error correction fixed. #19
atmel_nand.c: Empty page error correction fixed. #19
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add your signed-off-by:
and edit your commit message to match driver: nand: commit shortmessage
Not correcting anything in case of empty ECC data area is not an appropriate strategy, because an uncorrected bit-flip in an empty sector may cause upper layers (namely UBI) fail to work properly. Therefore the approach chosen in Linux kernel and other u-boot mtd drivers has been adopted, where a heuristic implemented by nand_check_erased_ecc_chunk() is used in order to detect and correct empty sectors. Signed-off-by: Kai Stuhlemmer (ebee Engineering) <[email protected]>
ba874cc
to
ff3c66d
Compare
Hi @ehristev, I change the commit message and added signed-off-by. I hope it is okay now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, @ksdd,
Sorry for the long delay, it's just now that I found time to look over this.
Your general approach is fine, and as you stated, it follows the linux implementation. A similar patch was done for linux at:
ff6ee101584c ("mtd: nand: atmel: correct bitflips in erased pages for pre-sama5d4 SoCs")
What can be improved is to factor out the following chunk:
-
dev_err(mtd->dev, "PMECC: Too many errors\n");
-
mtd->ecc_stats.failed++;
-
return -EBADMSG;
Don't worry about this, I can do this when applying.
I would like to understand more about your reasoning. I assume that you tested with sam9x60.
sam9x60 is capable of correcting empty pages, even though I see its discovered pmecc version is 0x102, which is smaller than PMECC_VERSION_SAMA5D4 (0x113). Thus, for the sam9x60 case, when you have less than CONFIG_PMECC_CAP bitflips, the code always goes through the else case and calls pmecc_correct_data(). When you have more that CONFIG_PMECC_CAP bitflips the code will superfluously call nand_check_erased_ecc_chunk(), as the ECC chunk was already checked internally in the hw.
9x60 works because you skipped the following code:
-
eccbytes = nand_chip->ecc.bytes;
-
for (i = 0; i < eccbytes; i++)
-
if (ecc[i] != 0xff)
-
goto normal_check;
-
/* Erased page, return OK */
-
return 0;
But your code should handle well the 5d3 case, which has a 0x112 pmecc version. Unfortunately, I think there's a bug on 5d3. Even when introducing bitflips, err_nbr is always 0, which is strange. I'll have to check that.
On which SoC did you test your code?
Btw, here's what I used for testing on sam9x60:
nand erase 0 1
nand read.raw 0x21000000 0 1
md.b 0x21000000 0x10e0
mm.b 0x21000000
md.b 0x21000000 0x10e0
nand write.raw 0x21000000 0 1
nand read 0x21000000 0 0x1000
md.b 0x21000000 0x1000
nand read.raw 0x21000000 0 1
md.b 0x21000000 0x10e0
Cheers,
ta
Hi @ambarus, our design uses a sam9x60d1g. Excerpt from dmesg:
Attached is a KOXIA TC58NVG1S3HBAI4 NAND flash. This flash requires 8-bit ECC and practical experience has shown, that it is prone to bit flips pretty much. That's why we discover all these NAND related issues ;-)
My practical tests had shown, that the pmecc handled empty page bit flips well, although this was not described as a feature for this version of pmecc. To double check and backup my observation I asked our Microchip FAE. The answer was:
With this feed-back I was happy to apply the patch and we didn't see any problems ever since. Feel free to refactor/improve my patch. Cheers, |
Hi, Kai, I've made it work for 5d3 as well. I'll send your patch on your behalf on u-boot mainline, and after it gets merged in our tree, we will close this PR. Thanks! |
err_nbr = nand_check_erased_ecc_chunk( | ||
buf_pos, host->pmecc_sector_size, | ||
ecc_pos, host->pmecc_bytes_per_sector, | ||
NULL, 0, (host->pmecc_corr_cap * 3) / 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the last parameter of nand_check_erased_ecc_chunk() has value (host->pmecc_corr_cap * 3) / 4 ? Shouldn't this be just host->pmecc_corr_cap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ambarus,
You are right! Just host->pmecc_corr_cap is correct. To be honest I don't remember, where this calculation came from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @ksdd!
No worries, I've already corrected it. Just FYI, I sent the patch on your behalf to u-boot mainline. Once it gets applied by Eugen, and also backported to our u-boot-at91, we'll close this PR.
Cheers,
ta
Patch available in u-boot-2021.04-at91. |
Not correcting anything in case of empty ECC data area
is not an appropriate strategy, because an uncorrected bit-flip
in an empty page may cause upper layers (namely UBI) fail to work
properly. Therefore the approach chosen in Linux kernel and other
u-boot mtd drivers has been adopted, where a heuristic implemented
by nand_check_erased_ecc_chunk() is used in order to detect and
correct empty pages.